Fundamentals of Fuzzy Clustering

نویسندگان

  • Rudolf Kruse
  • Christian Döring
  • Marie-Jeanne Lesot
چکیده

Clustering is an unsupervised learning task that aims at decomposing a given set of objects into subgroups or clusters based on similarity. The goal is to divide the data-set in such a way that objects (or example cases) belonging to the same cluster are as similar as possible, whereas objects belonging to different clusters are as dissimilar as possible. The motivation for finding and building classes in this way can be manifold (Bock, 1974). Cluster analysis is primarily a tool for discovering previously hidden structure in a set of unordered objects. In this case one assumes that a ‘true’ or natural grouping exists in the data. However, the assignment of objects to the classes and the description of these classes are unknown. By arranging similar objects into clusters one tries to reconstruct the unknown structure in the hope that every cluster found represents an actual type or category of objects. Clustering methods can also be used for data reduction purposes. Then it is merely aiming at a simplified representation of the set of objects which allows for dealing with a manageable number of homogeneous groups instead of with a vast number of single objects. Only some mathematical criteria can decide on the composition of clusters when classifying data-sets automatically. Therefore clustering methods are endowed with distance functions that measure the dissimilarity of presented example cases, which is equivalent to measuring their similarity. As a result one yields a partition of the data-set into clusters regarding the chosen dissimilarity relation. All clustering methods that we consider in this chapter are partitioning algorithms. Given a positive integer c, they aim at finding the best partition of the data into c groups based on the given dissimilarity measure and they regard the space of possible partitions into c subsets only. Therein partitioning clustering methods are different from hierarchical techniques. The latter organize data in a nested sequence of groups, which can be visualized in the form of a dendrogram or tree. Based on a dendrogram one can decide on the number of clusters at which the data are best represented for a given purpose. Usually the number of (true) clusters in the given data is unknown in advance. However, using the partitioning methods one is usually required to specify the number of clusters c as an input parameter. Estimating the actual number of clusters is thus an important issue that we do not leave untouched in this chapter.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

OPTIMIZATION OF FUZZY CLUSTERING CRITERIA BY A HYBRID PSO AND FUZZY C-MEANS CLUSTERING ALGORITHM

This paper presents an efficient hybrid method, namely fuzzy particleswarm optimization (FPSO) and fuzzy c-means (FCM) algorithms, to solve the fuzzyclustering problem, especially for large sizes. When the problem becomes large, theFCM algorithm may result in uneven distribution of data, making it difficult to findan optimal solution in reasonable amount of time. The PSO algorithm does find ago...

متن کامل

Clustering of Fuzzy Data Sets Based on Particle Swarm Optimization With Fuzzy Cluster Centers

In current study, a particle swarm clustering method is suggested for clustering triangular fuzzy data. This clustering method can find fuzzy cluster centers in the proposed method, where fuzzy cluster centers contain more points from the corresponding cluster, the higher clustering accuracy. Also, triangular fuzzy numbers are utilized to demonstrate uncertain data. To compare triangular fuzzy ...

متن کامل

Water Quality Zoning of Rivers by the Technique of Fuzzy Clustering Analysis

Zoning the pollution of a river may be the first or even the most important step in water quality management. In order to resolve its pollution, fuzzy clustering analysis may be used whenever a composite classification of water quality incorporates mutiple parameters&#10 &#10In such cases, the technique may be used as a complement or an alternative to comprehensive assessment. In fuzzy cluster...

متن کامل

A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data

The fuzzy c-means clustering algorithm is a useful tool for clustering; but it is convenient only for crisp complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...

متن کامل

Bilateral Weighted Fuzzy C-Means Clustering

Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy CMeans (BWFCM). We used a new objective function that uses some k...

متن کامل

Water Quality Zoning of Rivers by the Technique of Fuzzy Clustering Analysis

Zoning the pollution of a river may be the first or even the most important step in water quality management. In order to resolve its pollution, fuzzy clustering analysis may be used whenever a composite classification of water quality incorporates mutiple parameters In such cases, the technique may be used as a complement or an alternative to comprehensive assessment. In fuzzy clustering ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007